Skip to content

[Models] add fleet model fallback#7732

Closed
xiaoguoguo626807 wants to merge 20 commits into
PaddlePaddle:developfrom
xiaoguoguo626807:fleet
Closed

[Models] add fleet model fallback#7732
xiaoguoguo626807 wants to merge 20 commits into
PaddlePaddle:developfrom
xiaoguoguo626807:fleet

Conversation

@xiaoguoguo626807

@xiaoguoguo626807 xiaoguoguo626807 commented May 7, 2026

Copy link
Copy Markdown
Contributor

Motivation

新增 PaddleFleet 作为模型推理后端(--model-impl paddlefleet),通过将 PaddleFleet TransformerLayer 中的 core_attention 替换为 FastDeploy Attention 内核,实现在 PaddleFleet 模型结构上复用 FastDeploy 的 KV Cache 和高性能 Attention 计算。

Modifications

  • config.py: 新增 paddlefleetModelImpl 类型定义
  • engine/args_utils.py: 支持 --model-impl paddlefleet CLI 参数,并补充校验逻辑
  • model_executor/models/paddleformers/base_fleet.py: 新增 PaddleFleetModelBase 基类、FastDeployAttention 层及 patch_paddlefleet_core_attention 替换函数
  • model_executor/models/paddleformers/__init__.py: 注册 PaddleFleetForCausalLM 模型类
  • test_fallback_fleet_model.py` 需要独立的 PaddleFormers 和 PaddleFleet 依赖,使用 pytest conftest.py 钩子机制,在测试运行时动态安装依赖,避免污染全局环境

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --model-impl paddlefleet

Accuracy Tests

N/A(本 PR 新增 PaddleFleet 推理后端,尚未提供与参考实现的 logits 对齐数据)

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented May 7, 2026

Copy link
Copy Markdown

Thanks for your contribution!

@PaddlePaddle-bot

PaddlePaddle-bot commented May 8, 2026

Copy link
Copy Markdown

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-02 18:39:01

CI报告基于以下代码生成(30分钟更新一次):
PR commit: 820864a | Merge base: cb2d7c0 (branch: develop)


1 Required任务 : 8/10 通过

总执行(rerun次数) 总任务 ✅ 通过 ❌ 失败 ⏳ 运行中 ⏸️ 等待中 跳过
60(18) 42 37 5 0 0 0
任务 错误类型 置信度 日志
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage 不稳定问题:SHM 竞态导致 server 启动失败 Job
Approval 需要 Approval Job

2 失败详情

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage — 不稳定问题(置信度: 中)

分析器: ci_analyze_unittest_fastdeploy | 错误类型: 不稳定问题 | 置信度: 中

失败用例:

用例 错误摘要
pooling/test_Ernie4_5_reward_serving 第二次 server 启动时工作进程找不到 cache_ready_signal.8458 共享内存

关键日志:

File "fastdeploy/worker/gpu_model_runner.py", line 281, in __init__
    self.cache_ready_signal = IPCSignal(
File "fastdeploy/inter_communicator/ipc_signal.py", line 110, in __init__
    self.shm = SharedMemory(name=name)
FileNotFoundError: [Errno 2] No such file or directory: '/cache_ready_signal.8458'
ERROR  api_server.py[line:146] Failed to initialize FastDeploy LLM engine, service exit now!
  • 根因摘要: 第二次 server(无 prefix caching)启动时 SHM 名称冲突。测试依次启动两个服务器(第一个带 --enable-prefix-caching,第二个带 --no-enable-prefix-caching),两次使用相同端口 8458。第一个 server 被 clean_ports() 强杀后,cache_ready_signal.8458 共享内存未被完整清理(进程被 SIGTERM 中断,未执行 ipc_signal.clear());第二个 server 启动时 SHM 清理逻辑存在竞态,导致 engine 创建的新 SHM 在 worker 尝试 open 时找不到。

修复建议:

  1. 已知不稳定问题(SHM 竞态),建议先 rerun 验证是否偶发
  2. 若持续失败,检查 tests/pooling/test_Ernie4_5_reward_serving.py 的 server 切换逻辑:server_default_caching fixture 缺少 yield + teardown,未能在切换前等待第一个 server 完全退出(含 SHM cleanup)
  3. 建议 fixture 改为 yield 模式,在 teardown 中等待 os.killpg + sleep(2) 后再启动下一个 server

关联变更: PR 未修改 gpu_model_runner.py / ipc_signal.py / common_engine.py,与本次失败无直接代码关联

🔴 Approval — 需要 Approval

该 Job 需要人工 Approval,完成审批后 CI 才会继续执行。请通过人工审批。

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

@codecov-commenter

codecov-commenter commented May 8, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 32.92308% with 218 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@cb2d7c0). Learn more about missing BASE report.

Files with missing lines Patch % Lines
.../model_executor/models/paddleformers/base_fleet.py 29.80% 207 Missing and 5 partials ⚠️
fastdeploy/model_executor/utils.py 0.00% 4 Missing ⚠️
fastdeploy/model_executor/models/model_base.py 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7732   +/-   ##
==========================================
  Coverage           ?   67.72%           
==========================================
  Files              ?      468           
  Lines              ?    65509           
  Branches           ?    10067           
==========================================
  Hits               ?    44365           
  Misses             ?    18299           
  Partials           ?     2845           
Flag Coverage Δ
GPU 77.93% <32.92%> (?)
XPU 7.04% <0.30%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

@xiaoguoguo626807

Copy link
Copy Markdown
Contributor Author

/re-run all-failed

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Paddle-CI-Agent | pr_review | 2026-06-01 14:34:00

📋 Review 摘要

PR 概述:新增 PaddleFleet 作为模型推理后端(--model-impl paddlefleet),通过替换 PaddleFleet TransformerLayer 中的 core_attention 为 FastDeploy Attention 实现 KV Cache 复用。
变更范围model_executor/models/config.pyengine/args_utils.pygraph_optimization/decorator.pyscripts/tests/
影响面 Tag[Models] [FDConfig] [Engine] [Graph Optimization] [CI]

问题

级别 文件 概述
🟡 建议 fastdeploy/model_executor/graph_optimization/decorator.py:68 graph_opt_backend 不接受位置参数,*args 转发将导致启用图优化时崩溃

历史 Findings 修复情况

Finding 问题 状态
F1 pip 命令字符串拼接缺少空格 ✅ 已修复
F2 params_dtype 硬编码 bfloat16 ✅ 已修复(现使用 self.model_config.dtype or "bfloat16"
F3 同 F1 ✅ 已修复
F4 help 文本隐式拼接缺少空格 ⚠️ 仍存在
F5 PretrainedModel import 改为内部路径 ⚠️ 仍存在
F6 同 F1 ✅ 已修复
F7 layer_number 1-indexed vs Attention 0-indexed ⚠️ 仍存在
F8 load_weights 缺少日志 ✅ 已修复
F9 引用不存在的 test 文件 ✅ 已修复

📝 PR 规范检查

PR 描述结构合规,但 Checklist 勾选状态存在不一致:

  • [ ] Add unit tests:PR 已新增 tests/model_executor_fallback/test_fallback_fleet_model.py,应改为 [x]
  • [x] Provide accuracy results:Accuracy Tests 段填写 N/A,应改为 [ ](括号内注明原因即可)

标题建议(可直接复制):

  • [Models] Add PaddleFleet model fallback backend
PR 描述建议(点击展开,可直接复制)
## Motivation
新增 PaddleFleet 作为模型推理后端(`--model-impl paddlefleet`),通过将 PaddleFleet TransformerLayer 中的 `core_attention` 替换为 FastDeploy Attention 内核,实现在 PaddleFleet 模型结构上复用 FastDeploy 的 KV Cache 和高性能 Attention 计算。

## Modifications
- `config.py`: 新增 `paddlefleet``ModelImpl` 类型定义
- `engine/args_utils.py`: 支持 `--model-impl paddlefleet` CLI 参数,并补充校验逻辑
- `worker/worker_process.py`: 同步更新 `--model-impl` choices
- `model_executor/models/paddleformers/base_fleet.py`: 新增 `PaddleFleetModelBase` 基类、`FastDeployAttention` 层及 `patch_paddlefleet_core_attention` 替换函数
- `model_executor/models/paddleformers/__init__.py`: 注册 `PaddleFleetForCausalLM` 模型类
- `model_executor/graph_optimization/decorator.py`: 修复 `__call__` 支持位置参数(`*args`- `scripts/coverage_run.sh`: 新增 `isolated` 测试分类,将 fleet 相关测试置于最后运行
- `tests/model_executor_fallback/`: 新增 `conftest.py``test_fallback_fleet_model.py`

## Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --model-impl paddlefleet
```

## Accuracy Tests
N/A(本 PR 新增 PaddleFleet 推理后端,尚未提供与参考实现的 logits 对齐数据;后续 PR 将补充对齐结果)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

总体评价

整体设计合理,通过 monkey-patch core_attention 实现对 PaddleFleet 模型的 KV Cache 复用。历史 Findings 中 5/9 已修复。建议优先处理 F4(help 文本空格)和 F7(layer_id 偏移)两个遗留问题,以及本轮发现的 decorator *args 兼容性问题。

return self.forward(*args, **kwargs)

return self.graph_opt_backend(**kwargs)
return self.graph_opt_backend(*args, **kwargs)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 建议 graph_opt_backend.__call__ 仅接受 **kwargsGraphOptBackend.__call__(self, **kwargs)),此处转发 *args 会导致在 use_graph_opt=True 时抛出 TypeError

当前 PaddleFleet 模型已应用 @support_graph_optimization 装饰器,若用户配置开启图优化,调用链将触发此路径。

建议修复方式:

def __call__(self, *args, **kwargs):
    """Decorator model.__call__() func"""
    if not self.use_graph_opt:
        return self.forward(*args, **kwargs)
    # graph_opt_backend 仅支持 kwargs
    return self.graph_opt_backend(**kwargs)

或者在 GraphOptBackend.__call__ 中同步支持 *args

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants